Forecasting non-stationary financial time series through genetic algorithm

We utilize a recently developed genetic algorithm, in conjunction with discrete wavelets, for
carrying out successful forecasts of the trend in financial time series, including the NASDAQ
composite index. Discrete wavelets isolate the local, small-scale variations in these non-stationary
time series, after which the genetic algorithm's predictions are found to be quite accurate. The
power-law behavior in the Fourier domain reveals an underlying self-affine dynamical behavior, which
is well captured by the algorithm in the form of an analytic equation. Remarkably, the same equation
captures the trend of the Bombay Stock Exchange composite index quite well.

PACS numbers: 05.45.Tp, 89.90.+n, 89.65.Gh, 05.45.-a, 07.05.Mh

It is well known that a time series that looks random in nature may in fact be the outcome of
nonlinear deterministic but chaotic dynamics involving a few degrees of freedom. In such cases, it
is possible to exploit this determinism to make short-term forecasts. Financial time series,
originating from complex dynamical processes, are known to exhibit different types of behavior at
different time scales. Random processes like geometric Brownian motion and fractional Brownian
motion have been invoked for modelling stock market behavior. Fluctuations in asset prices have
also been analyzed through Levy-stable non-Gaussian models, mixtures of Gaussian distributions, etc.
Separation of the fluctuations at short time scales, which owe their origin to random processes, is
essential before attempting any forecast based on deterministic dynamics. This is complicated by
the fact that stock market composite indices often show non-stationary behavior. Wavelets, because
of their multi-resolution capability and time-frequency localization, are ideal for separating out
the fluctuations at different time scales in non-stationary time series.

The goal of the present paper is to make use of discrete wavelets to isolate the fluctuations, at
different scales, in non-stationary financial time series data and subsequently employ a recently
developed genetic algorithm for short-term predictions of the trend. We have used the NASDAQ and
Bombay Stock Exchange (BSE) composite indices for the purpose of our analysis.

We begin our work with the reconstruction of dynamics in phase space from a time series. The
theoretical ideas underlying this reconstruction are by now well known and are contained in the
works of Ruelle, Packard et al., and Takens. Thus, given a deterministic time series $x(t_k)$,
$t_k = k\Delta t$, $k = 1, \ldots, N$, there exists a smooth map $P$ satisfying

$$x(t) = P\left[x(t - \Delta t),\, x(t - 2\Delta t),\, \ldots,\, x(t - m\Delta t)\right], \qquad (1)$$

where $m$ is called the embedding dimension and $\Delta t$ is the sampling time interval.

A genetic algorithm tries to obtain the function P that best represents the map of a chaotic or
integrable time series. The map can then be used to predict the future state of the system.
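To make the role of the embedding dimension concrete, the following minimal sketch builds the input-output pairs implied by Eq. (1) for a unit sampling interval. The function name `delay_embed` and the sample values are illustrative assumptions and are not taken from the code used in the study.

```python
def delay_embed(series, m):
    """Build (inputs, targets) pairs for the map of Eq. (1) with unit sampling interval.

    inputs[i] holds [x(t-1), x(t-2), ..., x(t-m)] and targets[i] holds x(t).
    """
    inputs, targets = [], []
    for t in range(m, len(series)):
        # Most recent value first, matching the argument order of Eq. (1).
        inputs.append([series[t - j] for j in range(1, m + 1)])
        targets.append(series[t])
    return inputs, targets


# Illustrative values only, not market data.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
inputs, targets = delay_embed(series, m=4)
print(inputs)
print(targets)
```

A series of N points yields N - m such pairs, because the first m points have no complete history.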

Genetic Algorithm. — The genetic algorithm considers 
an initial population of potential solutions consisting of 
elementary equation strings. These equation strings (individual solutions) are of the type given
in Eq. (1). Their right-hand sides are stored in the computer as sets of character strings that
contain random sequences of the variable at previous times, the four basic arithmetic symbols
(+, -, ×, and /), and real number constants. A criterion that measures how well the equation
strings perform on a training set of the data is its fitness to the data, defined in Eq. (3) in
the text. The strongest strings choose a mate for reproduction, whereas the weaker strings become
extinct. The newly generated population is subjected to mutations that change fractions of
information. The evolutionary steps are repeated with the new generation. The process ends after
a number of generations defined a priori by the user.
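The evolutionary loop described above can be sketched in a few lines. This is a toy illustration written for this text, not the code used in the study: the nested-tuple representation, the rule that the top half of the population survives, the depth cap `MAX_DEPTH`, the mutation rate and all function names are assumptions. Fitness is taken to be the R^2 measure that the text uses as fitness strength, and the data are a synthetic geometric series.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,  # protected division
}
MAX_DEPTH = 6  # assumed cap that keeps the evolved strings short

def random_expr(m, depth=2):
    """Random equation string: lagged variables, constants and arithmetic symbols."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return ("x", random.randint(1, m))    # stands for x(t - j)
        return ("c", random.uniform(-2.0, 2.0))   # real number constant
    op = random.choice(sorted(OPS))
    return (op, random_expr(m, depth - 1), random_expr(m, depth - 1))

def evaluate(expr, lags):
    """Evaluate an equation string; lags[j - 1] holds x(t - j)."""
    if expr[0] == "x":
        return lags[expr[1] - 1]
    if expr[0] == "c":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], lags), evaluate(expr[2], lags))

def depth_of(expr):
    return 1 + max(depth_of(expr[1]), depth_of(expr[2])) if expr[0] in OPS else 0

def fitness(expr, inputs, targets):
    """R^2 = 1 - sum (x_c - x_0)^2 / sum (x_0 - <x_0>)^2 on the training set."""
    mean = sum(targets) / len(targets)
    total = sum((x0 - mean) ** 2 for x0 in targets)
    errors = [evaluate(expr, lags) - x0 for lags, x0 in zip(inputs, targets)]
    delta2 = sum(e * e for e in errors)
    if delta2 != delta2:                          # NaN produced by overflowing strings
        return float("-inf")
    return 1.0 - delta2 / total

def subtrees(expr):
    yield expr
    if expr[0] in OPS:
        yield from subtrees(expr[1])
        yield from subtrees(expr[2])

def graft(expr, donor):
    """Replace one randomly chosen node of expr by donor."""
    if expr[0] not in OPS or random.random() < 0.3:
        return donor
    if random.random() < 0.5:
        return (expr[0], graft(expr[1], donor), expr[2])
    return (expr[0], expr[1], graft(expr[2], donor))

def breed(a, b, m, mutation_rate=0.1):
    child = graft(a, random.choice(list(subtrees(b))))   # reproduction
    if random.random() < mutation_rate:
        child = graft(child, random_expr(m, depth=1))    # mutation
    return child if depth_of(child) <= MAX_DEPTH else a

def evolve(inputs, targets, m, pop_size=40, generations=15, seed=0):
    random.seed(seed)
    score = lambda e: fitness(e, inputs, targets)
    population = [random_expr(m) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        survivors = ranked[: pop_size // 2]              # weaker strings become extinct
        children = [breed(random.choice(survivors), random.choice(survivors), m)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=score), history

series = [1.01 ** k for k in range(60)]                  # synthetic data, not an index
m = 4
inputs = [[series[t - j] for j in range(1, m + 1)] for t in range(m, len(series))]
targets = series[m:]
best, history = evolve(inputs, targets, m)
print(best, history[-1])
```

Because the survivors are carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next. A production implementation would need a more careful treatment of selection and of constant optimization.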

This technique is first applied to the time series of the NASDAQ composite index, in the region
which shows maximum activity (897 points corresponding to the period between 18th February 1999
and 12th September 2002). A population consisting of 200 members was subjected to the iterative
algorithm for carrying out a one-day-ahead forecast. However, it was found that the forecast
obtained from the genetic algorithm was only marginally better than the persistence forecast,
$x_{t+1} = x_t$. It was thus felt that it would be better to attempt a forecast of the trend of
the data, after removal of the small-scale fluctuations, which owe their origin to random processes.
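The persistence forecast serves as the baseline here, so it is worth spelling out. The sketch below is illustrative only: the function names and the four sample values are assumptions, and the mean absolute error is just one possible way of scoring a forecast against the baseline.

```python
def persistence_forecast(series):
    """One-day-ahead persistence forecast: the prediction for day t+1 is the value on day t."""
    return series[:-1]


def mean_abs_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


series = [100.0, 102.0, 101.0, 105.0]      # illustrative values, not NASDAQ data
predicted = persistence_forecast(series)   # forecasts for the 2nd to 4th points
actual = series[1:]
print(mean_abs_error(predicted, actual))
```

Any forecasting equation is useful only to the extent that its error on held-out data is clearly below that of this baseline.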

Extraction of trend through discrete wavelets. — For 
this purpose, we make use of Coiflets, a family of discrete 
wavelets, known to be ideally suited for capturing the trend in a data set. Discrete wavelets
provide a complete orthonormal basis, which separates out the average (low-pass) part of a signal
from the variations (high-pass). The orthonormal basis consists of a scaling function (father
wavelet) and mother and daughter wavelets. The scaling function picks out the trend in a data set,
whereas the wavelets identify the variations at different scales. This is possible since the
wavelets have multi-resolution abilities. Coiflets are compactly supported, with filter length 6N,
where N is the order of the wavelet. They are nearly symmetrical and hence introduce minimal
distortion while capturing the trend. The father function has 2N-1 vanishing moments, while the
mother and daughter wavelet functions have 2N vanishing moments; this endows them with the
property of picking out the trend in an ideal manner.

FIG. 1: (Color Online) NASDAQ composite index (black line) and the trend extracted through
Coiflets (red line) superimposed over it, for 897 daily observations from 18th February 1999 to
12th September 2002. The near matching of the two demonstrates the superb ability of the Coiflets
in extracting the trend.
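The separation into low-pass and high-pass parts described above can be illustrated with the simplest discrete wavelet. Note that this sketch deliberately uses the Haar wavelet as a stand-in, not the Coiflets employed in the paper, because Haar fits in a few lines of standard-library Python. The function names, the sample signal and the restriction to lengths that are multiples of 2**levels are assumptions of the sketch.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (low-pass, high-pass) halves."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one level of the Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def haar_trend(signal, levels):
    """Discard the high-pass coefficients of the first `levels` levels and reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels in this sketch")
    approx = list(signal)
    for _ in range(levels):
        approx, _discarded = haar_step(approx)       # keep only the low-pass part
    for _ in range(levels):
        approx = haar_inverse_step(approx, [0.0] * len(approx))
    return approx


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0]  # illustrative values
trend = haar_trend(signal, levels=2)
print([round(v, 6) for v in trend])
```

With Haar the resulting trend is piecewise constant (block averages), which is much cruder than what a nearly symmetric wavelet with several vanishing moments produces; the sketch only shows the mechanics of dropping high-pass coefficients.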

The time series of the trend was subjected to the genetic algorithm. The first 700 points were
used for training the algorithm and the remaining points were used for forecast verification. The
genetic evolution process was initiated with 200 randomly selected equation strings, with m set
to 4; this value of the embedding dimension was obtained through the false nearest neighbor
approach. It was found that 5000 iterations were needed for extracting the forecast equation from
the set. Interestingly, the analytic expression involves only three out of the four input
parameters:



$$x_{\rm fit}(t) = P_{\rm fit}\left[x(t-1),\, x(t-3),\, x(t-4)\right] = \frac{x(t-4)\,\left(x(t-1)\right)^2}{\left(x(t-3)\right)^2}. \qquad (2)$$

Note that $x_{\rm fit}(t)$ is the prediction for the $t$-th day, and $x(t-1)$, $x(t-3)$ and
$x(t-4)$ are the trends on the $(t-1)$, $(t-3)$ and $(t-4)$ days, respectively.
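Applying Eq. (2) to a trend series is a one-line computation. The sketch below assumes the trend is stored in a zero-indexed list `x` and that `t` is at least 4; the function name and the synthetic series are illustrative. A geometric series is a convenient check, since substituting $x(t) = a r^t$ into Eq. (2) gives $a r^{(t-4) + 2(t-1) - 2(t-3)} = a r^t$, so the equation reproduces it exactly.

```python
def predict_trend(x, t):
    """Eq. (2): x_fit(t) = x(t-4) * x(t-1)**2 / x(t-3)**2, valid for t >= 4."""
    if t < 4:
        raise ValueError("Eq. (2) needs four earlier points")
    return x[t - 4] * x[t - 1] ** 2 / x[t - 3] ** 2


# Synthetic geometric trend, not market data.
trend = [100.0 * 1.01 ** k for k in range(10)]
predictions = [predict_trend(trend, t) for t in range(4, len(trend))]
print(max(abs(p - trend[t]) for p, t in zip(predictions, range(4, len(trend)))))
```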



The above number of iterations was sufficient in the sense that increasing the number of
iterations did not lead to a significant increase in the fitness strength:

$$R^2 = 1 - \frac{\Delta^2}{\sum \left(x_0 - \langle x_0 \rangle\right)^2}, \qquad (3)$$

where $\Delta^2 = \sum (x_c - x_0)^2$, $x_c$ is a parameter value estimated by the best scoring
equation, $x_0$ is the corresponding "true" value, and $\langle x_0 \rangle$ is the mean of the
"true" values of $x$.
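As a small worked example of the fitness strength, the sketch below evaluates R^2 for a handful of made-up numbers, assuming the usual normalization of the squared error by the variance of the "true" values about their mean. The function name and the values are illustrative assumptions.

```python
def r_squared(estimated, true):
    """Fitness strength: 1 - sum (x_c - x_0)^2 / sum (x_0 - <x_0>)^2."""
    mean_true = sum(true) / len(true)
    delta2 = sum((xc - x0) ** 2 for xc, x0 in zip(estimated, true))
    total = sum((x0 - mean_true) ** 2 for x0 in true)
    return 1.0 - delta2 / total


true = [1.0, 2.0, 3.0, 4.0]            # illustrative values
estimated = [1.1, 1.9, 3.2, 3.8]
print(r_squared(estimated, true))
```

Under this definition a perfect predictor scores 1, and a predictor that always returns the mean of the true values scores 0, which gives a natural scale for comparing equation strings.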

In Fig. 1, we show the time series of the data and the trend extracted through Coiflets-2, after
removal of four levels of high-pass coefficients. Retaining a few dominant high-pass coefficients,
through other thresholding approaches like that of Donoho, did not have a significant effect here.
More levels of decomposition and thresholding were found to be detrimental to the prediction,
since this removes significant physical variations from the data set.

FIG. 2: (Color Online) A zoomed portion of the NASDAQ composite index trend superimposed on the
predicted trend (197 observations, 6th December 2001 to 12th September 2002). Here the black line
represents the actual trend and the red circles the predicted trend.

Next we made predictions for 747 out-of-sample points, which were not used in training. It is
worth pointing out that, since the embedding dimension is m = 4, we are left with 743 input-output
relations. A zoomed portion of the same is depicted in Fig. 2. Fig. 3 shows the scatter plot of
the predicted and actual trend, corresponding to the time period between 30th November 2001 and
17th November 2004, showing a near-perfect fit. The equation of the best-fit straight line is
$y = 1.00039x - 0.59235$, and the coefficient of determination (square of the coefficient of
correlation) was found to be 0.99994, signifying a high degree of correlation.
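The best-fit line and the coefficient of determination quoted above are standard least-squares quantities. The following sketch shows how they can be computed from paired actual and predicted values; the function name and the sample numbers are illustrative assumptions, not the data behind Fig. 3.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and the squared correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


actual = [1.0, 2.0, 3.0, 4.0, 5.0]                 # illustrative values
predicted = [2.0 * a + 1.0 for a in actual]        # exactly linear in the actual values
slope, intercept, r2 = linear_fit(actual, predicted)
print(slope, intercept, r2)
```

A slope close to one, an intercept small compared with the index level, and a squared correlation close to one together indicate that the predictions track the actual trend without systematic scaling or offset.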

The efficacy of the prediction is further illustrated by the mean error, measured through the
average of the modulus of the return (in percent), which is found to be 0.06831 percent.
Considering the fact that the sign of the trend (rise or fall) is of significance for financial
time series, we have further calculated the following quantity:
$\mathrm{sign}(x_{i+1} - x_i) \cdot \mathrm{sign}(y_{i+1} - y_i)$, where $i$ goes from 1 to 742.
The rise and fall are very well captured by the prediction equation (Eq. (2)), since the number of
mismatches is only 40 out of 742.
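Both error measures can be sketched directly from their descriptions. The text does not spell out the formula for the mean error, so the percentage error below (absolute deviation of the prediction relative to the actual value) is an assumption, as are the function names and the sample values. A mismatch is counted whenever the product of the two signs is negative.

```python
def sign(v):
    return (v > 0) - (v < 0)


def direction_mismatches(actual, predicted):
    """Count steps where sign(x[i+1] - x[i]) * sign(y[i+1] - y[i]) is negative."""
    count = 0
    for i in range(len(actual) - 1):
        product = sign(actual[i + 1] - actual[i]) * sign(predicted[i + 1] - predicted[i])
        if product < 0:
            count += 1
    return count


def mean_abs_percent_error(actual, predicted):
    """Assumed form of the mean error: average of |y - x| / |x|, in percent."""
    return 100.0 * sum(abs(y - x) / abs(x) for x, y in zip(actual, predicted)) / len(actual)


actual = [100.0, 101.0, 103.0, 102.0, 104.0]       # illustrative values
predicted = [100.1, 100.9, 102.8, 102.9, 103.9]
print(direction_mismatches(actual, predicted))
print(mean_abs_percent_error(actual, predicted))
```

A series of 743 points gives 742 consecutive differences, which is consistent with the range of $i$ quoted above.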

Self-similar dynamics of financial time series. — We have analyzed the power spectrum of the
trend to ascertain the nature of this dynamical system and the efficacy of the wavelets in
removing small-scale fluctuations. It is found that the power spectrum shows self-similar
behavior; the dynamics have both integrable and chaotic components. As seen in Fig. 4, the power
spectrum of the Fourier transform of the trend has a power-law decay with exponent 1.88. It has
been recently shown that the corresponding exponent for chaotic dynamics is one, whereas
integrable systems have an exponent of two. Hence, the trend of stock market dynamics in the
present case is closer to an integrable system.

FIG. 4: Power spectrum of the Fourier transform of the trend, in a log-log scale. The linear fit
($y = -1.88043x - 4.26722$) indicates the closeness of the dynamics to integrable systems, for
which the slope is -2.

FIG. 3: (Color Online) The scatter plot of the predicted and actual trend shows a linear
behavior, illustrating the efficacy of the algorithm. The plus symbols represent the scatter plot
and the red line represents its linear fit.
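The power-law exponent discussed in the section on self-similar dynamics is the slope of a straight-line fit to the power spectrum on a log-log scale. The sketch below shows one way to compute it with a plain discrete Fourier transform from the standard library; the function names, the normalization of the power and the two synthetic checks are assumptions, and no market data are used.

```python
import cmath
import math


def power_spectrum(signal):
    """Return (frequency, power) pairs for the positive frequencies of a plain DFT."""
    n = len(signal)
    pairs = []
    for k in range(1, n // 2):                      # skip the zero-frequency term
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        pairs.append((k / n, abs(coeff) ** 2 / n))
    return pairs


def loglog_slope(pairs):
    """Least-squares slope and intercept of log10(power) against log10(frequency)."""
    xs = [math.log10(f) for f, _ in pairs]
    ys = [math.log10(p) for _, p in pairs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Check 1: a cosine with 5 cycles in 64 samples should peak at frequency 5/64.
signal = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
peak_frequency = max(power_spectrum(signal), key=lambda pair: pair[1])[0]

# Check 2: an exact power law P(f) = f**-2 should give a log-log slope of -2.
synthetic = [(f / 100.0, (f / 100.0) ** -2.0) for f in range(1, 50)]
slope, intercept = loglog_slope(synthetic)
print(peak_frequency, slope, intercept)
```

For real data, the spectrum returned by `power_spectrum` would be passed to `loglog_slope`; the fitted slope then plays the role of the exponent compared against the values one and two mentioned in the text.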

We have further checked the ability of the present algorithm using the average of daily closing
prices of the BSE 30 index. This index consists of 30 blue chip companies traded on the Bombay
Stock Exchange. The compilation of the values is based on the 'weighted aggregates' method. It is
remarkable that the same equation for the trend, Eq. (2), fits the trend of the BSE index,
extracted through Coiflets, extremely well, as seen in Fig. 5. This indicates an underlying
similarity between the dynamical processes governing the two composite indices. The mean error
found in the prediction of the BSE index was 0.08693 percent.

In conclusion, a technique has been developed for predicting the trend of the NASDAQ and other
composite indices using a genetic algorithm and the discrete wavelet transform. The algorithm uses
the past values of the trend extracted through wavelets for carrying out the prediction and is
based on the Darwinian theory of survival of the fittest equation strings. The major advantage of
using a genetic algorithm over other nonlinear forecasting techniques like neural networks is that
an explicit analytic expression for the dynamic evolution of the trend in the time series is
obtained.

FIG. 5: (Color Online) A zoomed portion of the BSE Sensex trend and its prediction. The black
line represents the trend and the red circles represent the predicted trend.

It is quite possible that the technique will prove to be successful in forecasting the trend of
individual stocks. The fluctuations in financial time series also show power-law behavior,
indicating a self-similar nature. It would be worthwhile to study their characteristics through
this formalism. The fluctuations at very small scales will not be amenable to modeling because of
their random origin. Making use of wavelets, one can separate the fluctuations at higher scales
for the possibility of prediction. Work in these directions is in progress and will be reported
in the future.

The authors are indebted to Dr. A. Alvarez for generously providing the computer code of the
genetic algorithm used in the study.